Data encryption detection

ABSTRACT

In some examples, a system applies an inline detection of a write of data in a storage, the inline detection to detect potential data encryption of the data. In response to an indication of the potential data encryption, the system creates a first object that represents a first version of the data, and applies a further analysis to determine whether the potential data encryption constitutes unauthorized data encryption, the further analysis based on the first object and a second object that represents a second version of the data that is prior to the first version of the data.

BACKGROUND

A ransomware attack involves encrypting data on a computer or on multiple computers connected over a network. In a ransomware attack, data can be encrypted using an encryption key, which renders the data inaccessible by users unless a ransom is paid to obtain the encryption key. A ransomware attack can be highly disruptive to enterprises, including businesses, government agencies, educational organizations, individuals, and so forth.

BRIEF DESCRIPTION OF THE DRAWINGS

Some implementations of the present disclosure are described with respect to the following figures.

FIGS. 1 and 2 are block diagrams of arrangements including various components for detecting unauthorized encryption of data, according to some examples.

FIG. 3 is a block diagram of a storage medium storing machine-readable instructions according to some examples.

FIG. 4 is a block diagram of a system according to some examples.

FIG. 5 is a flow diagram of a process according to some examples.

Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements. The figures are not necessarily to scale, and the size of some parts may be exaggerated to more clearly illustrate the example shown. Moreover, the drawings provide examples and/or implementations consistent with the description; however, the description is not limited to the examples and/or implementations provided in the drawings.

DETAILED DESCRIPTION

In the present disclosure, use of the term “a,” “an,” or “the” is intended to include the plural forms as well, unless the context clearly indicates otherwise. Also, the term “includes,” “including,” “comprises,” “comprising,” “have,” or “having” when used in this disclosure specifies the presence of the stated elements, but does not preclude the presence or addition of other elements.

A ransomware attack can be difficult to detect. By the time an enterprise becomes aware of the attack, most or all of the data has been encrypted and is thus inaccessible. A ransomware attack can be difficult to detect because normal computer operations may also encrypt data, so that distinguishing between authorized and unauthorized encryption of data can be challenging.

Enterprises may attempt to protect themselves from ransomware attacks by backing up their data to backup storage systems. However, ransomware attacks often first attack a backup storage system to encrypt data on the backup storage system, before encrypting data on computer(s), so that both data in the backup storage system and on the computer(s) become inaccessible.

Although reference is made to ransomware attacks in some examples, it is noted that there may be other sources of unauthorized data encryption in other examples, either caused by malware or other unauthorized entities (humans, programs, or machines). An “unauthorized data encryption” refers to a data encryption in which data has been encrypted by any entity that is not allowed to or supposed to perform the encryption.

Anti-malware programs rely on signatures of malware to detect whether the malware is present in a computer. However, an anti-malware program may attempt to remove malware after the malware has already infected the computer. Anti-malware programs may not be able to detect the presence of a ransomware attack, or may detect the ransomware attack too late in the process to prevent damage and significant loss of data.

In accordance with some implementations of the present disclosure, unauthorized data encryption activity is detected using multi-stage data encryption detection, which performs an inline detection to detect potential encryption of data in writes to a storage (e.g., containing a log or other repository of data), and in response to detecting the potential encryption of data, performs a further analysis to confirm whether the potential encryption of data detected by the inline detection constitutes an unauthorized data encryption. In some examples, the further analysis includes an object analysis of multiple objects including a first object created in response to detecting the potential encryption of data, and a second object created prior to the detection of the potential encryption of data, where the first and second objects represent different versions of the data. In further examples, the further analysis includes a pattern analysis of a pattern in input/output (I/O) operations. Details of the object analysis and pattern analysis are discussed further below.

FIG. 1 is a block diagram of an example arrangement that includes an inline detector 102, a pattern analyzer 104, and an object analyzer 106. Although depicted as three separate components, it is noted that in some examples, any two or more of the components 102, 104, and 106 can be combined into fewer components, or can be separated into additional components.

Also, in other examples, the pattern analyzer 104 or the object analyzer 106 can be omitted.

Each of the components 102, 104, and 106 can be implemented using a hardware processing circuit (or multiple hardware processing circuits), which can include any or some combination of a microprocessor, a core of a multi-core microprocessor, a microcontroller, a programmable integrated circuit, a programmable gate array, or another hardware processing circuit. Alternatively, each of the components 102, 104, and 106 can be implemented using a combination of a hardware processing circuit (or multiple hardware processing circuits) and machine-readable instructions (software and/or firmware) executable on the hardware processing circuit(s).

The inline detector 102, the pattern analyzer 104, and the object analyzer 106 can be part of the same computer system, or alternatively, can reside on multiple computer systems. In some cases, the inline detector 102, the pattern analyzer 104, and the object analyzer 106 can be present in disparate geographical locations.

As shown in FIG. 1, a requester 108 can issue requests to access (read or write) data in a storage 130. As used here, a “storage” can be implemented using a collection of storage devices, which can include a single storage device or multiple storage devices. As used here, a “collection” of elements can refer to a single element or multiple elements.

A “storage device” can refer to a disk-based storage device, a solid-state drive, a memory device, and/or any other type of component that is capable of storing data.

The requester 108 can refer to a user, a program, or a machine (e.g., a computer, a smartphone, or any other type of electronic device). A program can execute in an electronic device. A user can use an electronic device. In some examples, the requester 108 can include a virtual machine (VM). A VM emulates a physical machine and executes in an environment provided by a virtual machine monitor (VMM) or hypervisor. The VMM or hypervisor virtualizes physical hardware of a physical computer system for use by VM(s) in the physical computer system.

The requester 108 can send read and write requests to a storage system 100. The storage system 100 includes a request processing engine 112 to process requests issued by requesters, including the requester 108. An “engine” can refer to a hardware processing circuit (or multiple hardware processing circuits) or a combination of a hardware processing circuit (or multiple hardware processing circuits) and machine-readable instructions.

In some examples, the request processing engine 112 can include a storage controller that can respond to access requests by managing read and write access of the storage 130. As another example, the request processing engine 112 can be a server computer (or a collection of server computers) that can respond to access requests from requesters by submitting corresponding requests to a storage controller, such as over a network, including a local area network (LAN), a wide area network (WAN), a storage area network (SAN), or any other type of network. In examples where the request processing engine 112 includes server computer(s), the request processing engine 112 can be outside of the storage system 100.

In response to a read request from the requester 108, the request processing engine 112 causes a read of data 132 stored in the storage 130. In response to a write request from the requester 108, the request processing engine 112 causes a write of data to the storage 130.

In some examples, the request processing engine 112 can include a replication logic 140 that is to protect data of a requester, such as the requester 108. Protecting data can refer to protecting the data from loss due to a failure or another fault, such as in any part of the storage system 100 (e.g., in the storage 130, or in the request processing engine 112, or in a communication path, or any other component) that results in corruption or other loss of the data 132 (or portion of the data 132) or a failure in completing a write operation to the storage 130.

In some examples, the replication logic 140 can be implemented with a portion of the hardware processing circuit(s) of the request processing engine 112, or alternatively, with machine-readable instructions executable by the request processing engine 112. In other examples, the replication logic 140 is separate from the request processing engine 112.

In some examples, in response to write requests from the requester 108, the replication logic 140 can replicate data associated with the write requests. Moreover, the replication logic 140 can log write metadata associated with the write requests. Replicating data can refer to creating a copy of a version of data, such as a copy of a write data being written by a write request, a copy of a version of data prior to a write operation for the write request, and so forth.

The replicated data is written by the replication logic 140 over a data path 134 to a storage 110, which stores a journal 114 that contains logs of information associated with write requests. The data path 134 is for writing logs of write operations to the journal 114. The data path 134 is separate from a data path 133 for writes (initiated by requesters) to the data 132 stored in the storage 130.

The journal 114 can include replicated data 114-1 and write metadata 114-2 provided by the replication logic 140. “Replicated data” refers to a copy of a version of a portion of the data 132 in the storage 130. The replicated data 114-1 can include write data associated with write requests. For example, the replicated data 114-1 can include data checkpoints, where each data checkpoint can include a version of data at a respective point in time. Data checkpoints can be taken at respective different timepoints along a checkpoint timeline, in response to any of various events: a change in data, a time event, and so forth. Thus, as the data 132 in the storage 130 changes over time due to writes, the data checkpoints stored in the replicated data 114-1 can include different versions of the data at different timepoints. This allows data recovery to any specific point in time in case of data loss.

More generally, the replicated data 114-1 in the journal 114 can include other types of backup data that can be used for recovering lost data.

The write metadata 114-2 includes a log of write requests issued by requesters (including the requester 108) to the storage system 100. The write metadata 114-2 can include identifiers of storage volumes, locations (e.g., logical addresses) of the storage volumes in the storage 130, timestamps associated with writes to the storage volumes, and/or other metadata. A “storage volume” includes a logical unit of data and can contain a portion of the data 132 in the storage 130. In other examples, the write metadata 114-2 can refer to other types of data units, such as blocks, chunks, files, and so forth.

Although FIG. 1 shows the storage 110 as being separate from the storage 130 that stores the data 132, in other examples, the journal 114 can be stored in the same storage as the data 132.

There may be multiple journals 114 in the storage 110, where each journal 114 can be associated with a respective requester (or group of requesters). Thus, data for different requesters (or groups of requesters) can be protected using different corresponding journals 114.

In some examples, the journal 114 can be used to recover data in case of a failure or other fault in the system that results in loss of data (e.g., 132 in the storage 130).

The inline detector 102 is placed “inline” with a data stream in the data path 134 between the request processing engine 112 and the storage 110. The inline detector 102 is “inline” with the data path 134 if the inline detector 102 is able to receive in real time the replicated data that is being written to the journal 114.

The inline detector 102 receives the replicated data of the data path 134 in “real time” if the inline detector 102 receives the replicated data within a predetermined amount of time of the replicated data being transmitted from the request processing engine 112 to the storage 110, where the predetermined amount of time can be less than 10 seconds, or 5 seconds, or 1 second, or 100 milliseconds (ms), or 50 ms, or 10 ms, or 1 ms, and so forth.

The inline detector 102 is able to process the replicated data that is provided from the request processing engine 112 to the storage 110, to detect potential data encryption of the replicated data sent from the request processing engine 112 to the storage 110. Note that encryption of the replicated data sent from the request processing engine 112 to the storage 110 can result from a malware attack (e.g., a ransomware attack) in which the malware attempts to encrypt both the data 132 in the storage 130 as well as any backup data, including the replicated data in the journal 114.

The potential data encryption detected by the inline detector 102 may not be data encryption caused by an unauthorized entity such as malware. For example, the potential data encryption may be data encryption performed by authorized entities, such as programs, machines, or users, as part of normal operations of a system (e.g., a computer, a storage system, a communication node, etc.). As another example, the potential data encryption detected by the inline detector 102 may not actually be data encryption, but a change in data performed by a different type of operation that is authorized.

To confirm that the potential data encryption detected by the inline detector 102 is in fact an unauthorized data encryption, a multi-stage data encryption detection is performed. The multi-stage data encryption detection employs the inline detector 102 in the first stage, followed by analysis using the object analyzer 106 and/or the pattern analyzer 104 in a further stage (or multiple further stages).

In response to detecting the potential data encryption, the inline detector 102 can send a potential data encryption indication (PDEI) 120 to the pattern analyzer 104 and the object analyzer 106. The PDEI 120 can be in the form of a message, an information element, a signal, or any other type of indicator. The PDEI 120 can be sent as a message over a network, a message or other indicator through an application programming interface (API), an inter-process interface, or any other type of interface. The PDEI 120 sent to the pattern analyzer 104 and/or the object analyzer 106 can trigger the pattern analyzer 104 and/or the object analyzer 106 to perform further analysis to confirm that an unauthorized data encryption has occurred.

Inline Detector Analysis

The inline detector 102 applies a statistical analysis that uses observed (absolute) data entropy to detect potential data encryption. An example of the observed data entropy is Shannon entropy, which is a measure of the uncertainty or variability associated with a random variable. Shannon entropy quantifies the expected value of information contained in a message.

Data in the data path from the request processing engine 112 to the storage 110 is sampled at various time intervals (e.g., periodically, randomly, or in response to specified events) as the data is streamed to the storage 110. For each sample of K blocks, each block of size T, a measure according to a Shannon entropy is computed for each block of the K blocks, where K ≥ 1, and T is a predefined value. In such examples, Shannon entropy quantifies the expected value of information contained in each block of the K blocks.

The Shannon entropy measures for the K blocks are collected into a list of entropy measures. The list of entropy measures is used as an input to a statistical test (e.g., T-test, or another statistical test) to determine whether the observed entropy measures match a list of statistically expected entropy measures for a strongly encrypted block of size T with a target confidence. In other words, the expected entropy measures are entropy measures that would be expected if data were encrypted. The expected entropy values are precomputed ahead of time.

A T-test can be used to determine if there is a significant difference between the means of two groups of measures, which in some examples of the present disclosure include (1) the list of Shannon entropy measures computed for the K blocks of a respective sample, and (2) the list of precomputed expected entropy values. If the mean of the Shannon entropy measures in the list of Shannon entropy measures computed for the K blocks is similar to the mean of the precomputed expected entropy values to within a specified threshold, then that indicates that potential data encryption has been detected by the inline detector 102. On the other hand, if the mean of the Shannon entropy measures in the list of Shannon entropy measures computed for the K blocks differs from the mean of the precomputed expected entropy values by greater than the specified threshold, then that indicates no potential data encryption has been detected by the inline detector 102.
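
As an illustration only, the comparison of the two lists of entropy measures can be sketched in Python with a hand-computed Welch's t statistic. The function names, the example entropy values, and the cutoff t_threshold are hypothetical and are not specified by the present disclosure; an implementation could instead derive the cutoff from a target confidence level.

```python
import math
import statistics


def welch_t_statistic(observed, expected):
    """Welch's t statistic for two lists of entropy measures.

    Each list needs at least two values so a sample variance exists.
    """
    mean_obs = statistics.mean(observed)
    mean_exp = statistics.mean(expected)
    var_obs = statistics.variance(observed)
    var_exp = statistics.variance(expected)
    standard_error = math.sqrt(var_obs / len(observed) + var_exp / len(expected))
    if standard_error == 0.0:
        # Degenerate case: both lists are constant.
        return 0.0 if mean_obs == mean_exp else math.copysign(math.inf, mean_obs - mean_exp)
    return (mean_obs - mean_exp) / standard_error


def sample_tests_positive(observed, expected, t_threshold=2.0):
    """True when the observed mean is statistically close to the expected mean.

    t_threshold is a hypothetical cutoff chosen for illustration.
    """
    return abs(welch_t_statistic(observed, expected)) <= t_threshold


# Hypothetical entropy measures in bits per byte.
expected_encrypted = [7.99, 7.98, 7.99, 7.97, 7.98]
observed_high = [7.98, 7.99, 7.98, 7.97, 7.99]
observed_low = [4.1, 4.5, 4.3, 4.2, 4.4]
```

Under these assumed values, the high-entropy sample tests positive and the low-entropy sample does not.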

In a further example, an incoming data stream for a write of an object is sampled at random intervals. The inline detector 102 collects in a buffer N (N ≥ 2) bytes that are randomly sampled. The inline detector 102 creates an empty histogram with a specified quantity (e.g., 256) of buckets, assuming all possible byte values may be present (one bucket per possible byte value). The inline detector 102 populates the histogram with respective bucket counts of the byte values from the sample. The inline detector 102 converts each bucket count into a probability of appearance of the byte in the buffer by dividing the bucket count by N. The conversion produces a probability vector, where each entry of the vector is a probability of occurrence of the corresponding byte value.

The inline detector 102 calculates an entropy of the resulting probability vector using Shannon's formula. The inline detector 102 collects the resulting value into an entropy vector. After a predefined number of entropy values have been collected into the entropy vector, the entropy vector is used as an input into a T-test, to establish whether it is statistically significant to assume that the resulting entropy vector has been drawn out of an encrypted data population.
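
The histogram, probability vector, and entropy steps described above can be sketched as follows. This is a minimal Python illustration; the function names are hypothetical, and the entropy is expressed in bits (base-2 logarithm), which is an assumption since the disclosure does not fix a logarithm base.

```python
import math
from collections import Counter


def byte_probability_vector(buffer: bytes) -> list:
    """Return a 256-entry vector of byte-value probabilities for the buffer."""
    if not buffer:
        raise ValueError("buffer must contain at least one byte")
    counts = Counter(buffer)  # histogram: one bucket per observed byte value
    n = len(buffer)
    # Divide each bucket count by N; byte values never seen get probability 0.
    return [counts.get(value, 0) / n for value in range(256)]


def shannon_entropy(probabilities) -> float:
    """Shannon entropy in bits; zero-probability buckets contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def buffer_entropy(buffer: bytes) -> float:
    """Entropy of one sampled buffer, suitable for appending to an entropy vector."""
    return shannon_entropy(byte_probability_vector(buffer))
```

With 256 buckets the result ranges from 0 bits (a buffer of one repeated byte) to 8 bits (all byte values equally likely), so strongly encrypted data is expected to score close to 8.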

In other examples, instead of comparing means, other statistical comparisons of the list of Shannon entropy measures computed for the K blocks and the list of precomputed expected entropy values can be performed.

Once N (N ≥ 1 or N ≥ 2) consecutive samples (where each sample has K blocks of data) test positive for encryption using the statistical test as noted above, the PDEI 120 can be set, and the inline detector 102 can send the PDEI 120 to the pattern analyzer 104 and/or the object analyzer 106.
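
The rule of requiring N consecutive positive samples before setting the PDEI 120 can be sketched as a small counter. This is an illustrative Python sketch; the class name and method name are hypothetical, and resetting the streak on any negative sample is an assumption drawn from the word "consecutive".

```python
class ConsecutivePositiveTrigger:
    """Signals once `required` samples in a row have tested positive."""

    def __init__(self, required: int):
        if required < 1:
            raise ValueError("required must be at least 1")
        self.required = required
        self.streak = 0

    def observe(self, sample_positive: bool) -> bool:
        """Record one sample result; return True when the PDEI should be set."""
        # A negative sample breaks the run of consecutive positives.
        self.streak = self.streak + 1 if sample_positive else 0
        return self.streak >= self.required


trigger = ConsecutivePositiveTrigger(required=3)
results = [trigger.observe(r) for r in [True, True, False, True, True, True]]
```

In this usage, only the final sample completes a run of three positives, so only the last entry of results is True.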

The variables K, T, and N are tunable parameters that can be derived from empirical or experimental data. In other examples, the variables can be adjusted using machine learning as data patterns are observed.

Object Analyzer Analysis

In response to receiving the PDEI 120, the object analyzer 106 can perform an independent assessment of whether the potential data encryption indicated by the inline detector 102 constitutes unauthorized data encryption. The analysis performed by the object analyzer 106 is based on multiple objects, including a first object 150 and a second object 152. The first and second objects 150 and 152 represent different versions of data in the journal 114. The second object 152 represents a version of data that is prior in time to the first version of data represented by the first object 150. In some examples, the first and second objects 150 and 152 are different snapshots of data in the journal 114. A “snapshot” of data refers to a copy of data that was generated at a respective time.

For example, the second version of the data represented by the second object 152 is a version prior to the potential data encryption indicated by the inline detector 102, while the first version of data represented by the first object 150 is after the potential data encryption indicated by the inline detector 102. In other words, the first object 150 contains data that may potentially be corrupted, while the second object 152 contains data that has not been potentially corrupted by the potential data encryption.

In some examples, the object analyzer 106 may create the first object 150 or request that the first object 150 be created (such as by taking a snapshot of a portion of the replicated data 114-1 in the journal 114), in response to the PDEI 120 from the inline detector 102.

An “object” can refer to any separately identifiable unit of data in the journal 114. For example, an object can include a data checkpoint. In examples where the journal 114 includes multiple data checkpoints, the first and second objects 150 and 152 include different data checkpoints along a checkpoint timeline.

The object analyzer 106 compares a suspect object (e.g., the first object 150) flagged by the inline detector 102 to a prior object (e.g., the second object 152).

Although reference is made to comparing two objects, it is noted that the object analyzer 106 can compare more than two objects in other examples. For example, the object analyzer 106 can compare a first group of objects containing data after the potential data encryption with a second group of objects containing data prior to the potential data encryption.

The object analyzer 106 can use statistical and/or machine learning techniques. An example of a statistical technique includes performing a relative entropy calculation between the two objects to measure a difference between the two versions of data represented by the first and second objects 150 and 152. An example of relative entropy is Kullback-Leibler divergence, which represents statistical distance that measures how a first probability distribution is different from a second probability distribution.

An incoming data stream for a write of an object is sampled at random intervals. The object analyzer 106 collects in a buffer N (N ≥ 2) bytes that are randomly sampled. The object analyzer 106 creates an empty histogram with a specified quantity (e.g., 256) of buckets, assuming all possible byte values may be present (one bucket per possible byte value). The object analyzer 106 populates the histogram with respective bucket counts of the byte values from the sample. The object analyzer 106 converts each bucket count into a probability of appearance of the byte in the buffer by dividing the bucket count by N. The conversion produces a probability vector, where each entry of the vector is a probability of occurrence of the corresponding byte value. The probability vector is an example of a probability distribution. Two probability vectors are computed for the two objects with respect to which the relative entropy is to be calculated.

A divergence (relative entropy) between the first and second probability distributions (e.g., first and second probability vectors) is determined. If the divergence (relative entropy) exceeds a specified threshold, then the object analyzer 106 can output an indication that unauthorized data encryption is present. This indication is depicted as an indication of unauthorized data encryption 160 in FIG. 1.
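
A Kullback-Leibler divergence between the byte distributions of two objects can be sketched as follows. This is a hedged Python illustration: the function names are hypothetical, and the additive smoothing constant epsilon is an assumption introduced here because the divergence is undefined when the prior object's distribution assigns zero probability to a byte value that the suspect object contains. The direction of the divergence (suspect relative to prior) is likewise an assumption.

```python
import math
from collections import Counter


def smoothed_byte_distribution(data: bytes, epsilon: float = 1e-9) -> list:
    """256-entry byte distribution with a small additive smoothing term.

    epsilon is a hypothetical constant that keeps every entry non-zero.
    """
    counts = Counter(data)
    total = len(data) + 256 * epsilon
    return [(counts.get(value, 0) + epsilon) / total for value in range(256)]


def kl_divergence(p, q) -> float:
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))


def object_divergence(suspect: bytes, prior: bytes) -> float:
    """Relative entropy of the suspect object's bytes with respect to the prior object's."""
    return kl_divergence(
        smoothed_byte_distribution(suspect),
        smoothed_byte_distribution(prior),
    )


# Stand-in data: repetitive text as the prior version, and a uniform spread
# of byte values as a rough proxy for an encrypted version.
prior_version = b"hello world " * 100
suspect_version = bytes(range(256)) * 5
```

The resulting value would then be compared against the specified threshold; the threshold itself is a tunable that this sketch does not choose.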

The indication of unauthorized data encryption 160 can be transmitted by the object analyzer 106 to a target entity, such as to an administrator, to a program, or to a machine. Note that in some examples, the encrypted object 150 (or an identifier of the encrypted object 150) may also be transmitted to the target entity. On the other hand, if the divergence (relative entropy) does not exceed the specified threshold, then that indicates that unauthorized data encryption has not occurred, and the object analyzer 106 may cause removal or deletion of the first object 150. In response to the indication of unauthorized data encryption 160 from the object analyzer 106, the target entity can initiate a remediation action to counter the unauthorized data encryption (which can be due to a ransomware attack). The remediation action can include shutting down the storage system 100, disabling network communication with the storage system 100, and so forth.

The relative entropy calculation indicates whether an increase of entropy per object is evident, which indicates that data encryption has occurred.

As the relative entropy calculation is computationally intensive, the calculation can be performed by the object analyzer 106 on a system separate from the storage system 100.

In further examples, the object analyzer 106 can apply a hash function on the first and second objects 150 and 152. The hash function applied on the first object 150 produces a first hash value, and the hash function applied on the second object 152 produces a second hash value. The object analyzer 106 compares the first and second hash values. The difference between the hash values of the two different objects provides an indication of a “distance” between the data versions represented by the first and second objects 150 and 152, where a larger distance (e.g., larger difference in hash values) indicates that more change has occurred, which is indicative of unauthorized data encryption. For example, if the first and second hash values differ by greater than a specified threshold, then that indicates unauthorized data encryption has occurred. If the first and second hash values do not differ by greater than the specified threshold, then that indicates unauthorized data encryption has not occurred.

In other examples, the first and second objects 150 and 152 are fed as inputs to a machine learning model. The machine learning model can be trained using training data that includes objects subject to unauthorized data encryption, and objects not subject to unauthorized data encryption. The objects in the training data are labelled as encrypted or not encrypted, so that the machine learning model can learn how an encrypted object differs from an unencrypted object. The trained machine learning model can produce an output based on the first and second objects 150 and 152, where the output includes an indication of whether or not unauthorized data encryption has occurred.

Pattern Analyzer Analysis

In response to receiving the PDEI 120, the pattern analyzer 104 can perform an independent assessment (in addition to or instead of the analysis by the object analyzer 106) of whether the potential data encryption indicated by the inline detector 102 constitutes unauthorized data encryption.

The pattern analyzer 104 does not analyze the actual data, but rather, analyzes a write I/O pattern that can be discerned from the write metadata 114-2 in the journal 114. The pattern analyzer 104 can determine if a write I/O pattern deviates from a baseline write I/O pattern 154 by more than a specified threshold. For example, relative entropy such as a Kullback-Leibler divergence can be computed between the write I/O pattern determined from the write metadata 114-2 and the baseline write I/O pattern 154. If a measure of the divergence is greater than the specified threshold, then the pattern analyzer 104 can indicate presence of an unauthorized data encryption. The pattern analyzer 104 can send an indication of the unauthorized data encryption 162 to a target entity, which can initiate a remediation action. In other examples, the pattern analyzer 104 may use other indications of differences between a write I/O pattern determined from the write metadata 114-2 and the baseline write I/O pattern 154 to detect whether unauthorized data encryption is present.

The baseline write I/O pattern 154 may be derived by an entity (e.g., a human, a program, or a machine) based on historical write operations to the storage 130 (and/or to any other storage or group of storages).

A write I/O pattern can refer to any collection of write operations performed with respect to storage volumes (or other types of data units) in the storage 130, where the collection of write operations exhibits a spatial and/or a temporal pattern, or any other type of pattern. For example, the collection of write operations can be made with respect to certain storage volumes as identified by storage volume identifiers and locations in the write metadata 114-2. The pattern analyzer 104 may derive a first spatial pattern of write operations with respect to storage volumes in the storage 130 based on the write metadata 114-2 in the journal 114. The baseline write I/O pattern 154 may indicate a second spatial pattern. If the first spatial pattern and the second spatial pattern have a divergence that exceeds a threshold, then the pattern analyzer 104 may output an indication of unauthorized data encryption.
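
One way to picture a spatial pattern comparison is to treat each pattern as the share of writes that each storage volume receives and to compute a divergence between the observed and baseline shares. The following Python sketch makes that assumption; the function names, the volume identifiers, and the smoothing constant epsilon are hypothetical, and the disclosure does not limit a spatial pattern to per-volume write counts.

```python
import math
from collections import Counter


def volume_write_distribution(volume_ids, all_volumes, epsilon=1e-6):
    """Share of writes per storage volume, smoothed so no share is zero.

    epsilon is a hypothetical smoothing constant.
    """
    counts = Counter(volume_ids)
    total = len(volume_ids) + epsilon * len(all_volumes)
    return {v: (counts.get(v, 0) + epsilon) / total for v in all_volumes}


def spatial_divergence(observed_ids, baseline_ids) -> float:
    """Kullback-Leibler divergence (bits) of observed writes from the baseline."""
    all_volumes = sorted(set(observed_ids) | set(baseline_ids))
    p = volume_write_distribution(observed_ids, all_volumes)
    q = volume_write_distribution(baseline_ids, all_volumes)
    return sum(p[v] * math.log2(p[v] / q[v]) for v in all_volumes)


# Hypothetical volume identifiers taken from write metadata records.
baseline_writes = ["vol1"] * 80 + ["vol2"] * 15 + ["vol3"] * 5
similar_writes = ["vol1"] * 78 + ["vol2"] * 17 + ["vol3"] * 5
sweeping_writes = ["vol1"] * 34 + ["vol2"] * 33 + ["vol3"] * 33
```

In this sketch, writes that sweep evenly across all volumes diverge far more from the baseline than writes that follow the historical mix, which is the kind of difference a threshold comparison would act on.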

As a further example, the write operations in the collection of write operations can perform writes to storage volumes that have temporal characteristics according to timestamps associated with the write operations (e.g., the timestamps of write requests are stored in the write metadata 114-2). As examples, the timestamps in the write metadata 114-2 may indicate that writes to storage volumes within a given group of storage volumes may occur on average a first difference in time ΔT1 apart. The baseline write I/O pattern 154 may indicate that, historically, writes to storage volumes within the given group of storage volumes (and/or other group(s) of storage volumes) may occur on average a second difference in time ΔT2 apart. If ΔT1 is less than ΔT2 by some specified time difference threshold, then that may indicate that unauthorized data encryption is occurring since data encryption is being performed to storage volumes at a greater frequency than normal.
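
The temporal comparison of ΔT1 against ΔT2 can be sketched in Python as follows. The function names and parameters are hypothetical, the timestamps are assumed to be numeric values in a common unit such as seconds, and interpreting "less than ΔT2 by some specified time difference threshold" as ΔT2 − ΔT1 exceeding that threshold is an assumption.

```python
import statistics


def mean_inter_write_interval(timestamps):
    """Average gap between successive writes (the ΔT1 of the text)."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        raise ValueError("at least two timestamps are needed")
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return statistics.mean(gaps)


def writes_unusually_frequent(timestamps, baseline_interval, min_difference):
    """True when the observed interval undercuts the baseline by more than min_difference.

    baseline_interval plays the role of ΔT2; min_difference is the
    specified time difference threshold.
    """
    observed_interval = mean_inter_write_interval(timestamps)
    return (baseline_interval - observed_interval) > min_difference


# Hypothetical write timestamps in seconds.
burst = [0, 1, 2, 3]
normal = [0, 9, 18, 27]
```

With a baseline interval of 10 seconds and a threshold of 5 seconds, the burst of writes one second apart is flagged while the writes nine seconds apart are not.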

In other examples, the pattern analyzer 104 can use both differences in temporal patterns and spatial patterns in determining whether unauthorized data encryption is present.

In some examples, the pattern analyzer 104 is applied to the journal 114, rather than to data in-flight (in real time) through the data path 134 between the request processing engine 112 and the storage 110. Applying pattern analysis on the data in real time can be costly. However, since the journal 114 logs all write operations, the pattern analyzer 104 can perform its analysis of the journal 114, which is not part of the time-sensitive data path.

In further examples, the pattern analyzer 104 can use machine learning to determine whether or not a write I/O pattern indicated by the write metadata 114-2 is indicative of unauthorized data encryption. A machine learning model can be trained using training data to learn write I/O patterns that are indicative of unauthorized data encryption. Note that the machine learning model does not look at the write data, but rather, analyzes the pattern (spatial and/or temporal) of write operations to the storage 130.

Further Examples

FIG. 2 shows another example arrangement according to further examples of the present disclosure. A system 200 includes VMs 202 that execute in the system 200. The system 200 can be implemented using a computer or a collection of computers.

Each VM contains an application program (or multiple application programs) and a guest operating system (OS) (not shown). The application programs in the VMs 202 and/or the guest OSes can issue requests to access data in the storage 130. In the example of FIG. 2, a VM 202 is an example of the requester 108 of FIG. 1. Each VM 202 is able to issue access requests (read or write requests) for the data 132 in the storage 130.

The system 200 also includes a VMM 204 (also referred to as a hypervisor) which virtualizes physical resources (including processing resources, storage resources, input/output (I/O) resources, communication resources, etc.) of the system 200, to make such physical resources available to the VMs 202.

In some examples, an access request for the data 132 in the storage 130 issued from a VM 202 is intercepted by the VMM 204, which manages the access of the data 132 in the storage 130.

The system 200 also includes a virtual replication appliance (VRA) 206, which is an example of the replication logic 140 of FIG. 1. In some examples, the VRA 206 can also be a VM that performs tasks similar to those of the replication logic 140 of FIG. 1. The VRA 206 manages the replication of data from protected VMs 202 to the journal 114 in the storage 110. A “protected VM” can refer to a VM in the system 200 whose data is protected from loss by the VRA 206. Note that some of the VMs 202 may not be protected VMs, in which case the VRA 206 does not replicate data for such unprotected VM(s).

A data path 208 is provided between the VRA 206 and the storage 110 over which replicated data is provided to the storage 110. In some examples, if the VRA 206 is a VM, then the data path 208 also includes a path through the VMM 204. The journal 114 contains similar content as the journal in FIG. 1 (the replicated data 114-1 and the write metadata 114-2).

Similar to FIG. 1, the inline detector 102 is provided to perform inline analysis of a stream of data over the data path 208 to detect potential data encryption. In response to detecting potential data encryption, the inline detector 102 issues the PDEI 120 to the object analyzer 106 and the pattern analyzer 104.

FIG. 2 further depicts a protection manager 210. The protection manager 210 can execute in a system that is separate from the system 200. In some examples, the protection manager 210 is responsible for disaster recovery of data in case of loss of data in the system 200. For example, the protection manager 210 can perform replication of objects, including the objects 150 and 152, from the journal 114 to a storage site that is remote from the system 200. In some examples, the protection manager 210 can take snapshots of the content of the journal 114. The objects 150 and 152 are snapshots, for example.

In response to the PDEI 120, the object analyzer 106 can perform analysis of the objects 150 and 152, and the pattern analyzer 104 can perform I/O pattern analysis, as discussed above in connection with FIG. 1.

FIG. 3 is a block diagram of a non-transitory machine-readable or computer-readable storage medium 300 storing machine-readable instructions that upon execution cause a system to perform various tasks. The system can include a computer or a collection of computers.

The machine-readable instructions include inline detection instructions 302 to apply an inline detection of a write of data in a storage (e.g., 130 in FIG. 1 or 2), the inline detection to detect potential data encryption of the data. In some examples, the inline detection is applied to writes to a journal (e.g., 114 in FIG. 1 or 2) that logs writes in a storage system. In some examples, writes to the journal are over a data path (e.g., 134 in FIG. 1 or 208 in FIG. 2) that is separate from a data path (e.g., 133 in FIG. 1 or 2) for the writes between one or more requesters and the storage system. In some examples, the inline detection is based on calculation of an absolute entropy in write data.
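As a non-limiting illustration, an entropy-based inline check can be sketched as follows. The sketch assumes that the absolute entropy is computed as Shannon entropy over the byte values of a write buffer, and the function names and the 7.5 bits-per-byte threshold are hypothetical values chosen for this example rather than values given in the disclosure.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a buffer in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag a write as potentially encrypted when its entropy is high.

    The threshold is an assumed tuning value; compressed data can also
    score high, which is why a further analysis follows the indication.
    """
    return shannon_entropy(data) >= threshold
```

A buffer in which every byte value appears equally often reaches the maximum of 8 bits per byte, whereas repetitive text scores far lower.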

The machine-readable instructions include first object creation instructions 304 to, in response to an indication of the potential data encryption, create a first object that represents a first version of the data.

The machine-readable instructions include object analysis application instructions 306 to apply a further analysis to determine whether the potential data encryption constitutes unauthorized data encryption, the further analysis based on the first object and a second object that represents a second version of the data that is prior to the first version of the data.

In some examples, the creating of the first object includes creating a first snapshot, and the second object includes a second snapshot created prior to the first snapshot.

In some examples, the further analysis is based on calculation of relative entropy on the first object and the second object.
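One possible way to compute such a measure is sketched below. The sketch assumes that relative entropy means the Kullback-Leibler divergence between the byte-value distributions of the two objects, measured from the first object to the second, with add-one smoothing so that the ratio is always defined. These choices, and the function names, are assumptions for illustration only.

```python
import math
from collections import Counter


def byte_distribution(data: bytes, smoothing: float = 1.0):
    """Smoothed probability of each of the 256 byte values in `data`."""
    counts = Counter(data)
    total = len(data) + 256 * smoothing
    return [(counts.get(value, 0) + smoothing) / total for value in range(256)]


def relative_entropy(first: bytes, second: bytes) -> float:
    """KL divergence, in bits, of the first object's byte distribution
    from the second (earlier) object's byte distribution."""
    p = byte_distribution(first)
    q = byte_distribution(second)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q))
```

Identical objects yield a divergence of zero, and a larger value suggests that the content distribution shifted between the two versions.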

In some examples, the further analysis is based on calculation of hashes of the first object and the second object.
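A hash-based comparison can be sketched, under stated assumptions, as a block-by-block comparison of the two objects. The fixed 4096-byte block size, the use of SHA-256, and the function names are assumptions for this sketch; the disclosure does not specify how the hashes are computed or compared.

```python
import hashlib


def block_hashes(data: bytes, block_size: int = 4096):
    """SHA-256 digest of each fixed-size block of `data`."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def changed_block_fraction(first: bytes, second: bytes, block_size: int = 4096) -> float:
    """Fraction of block positions whose hashes differ between the objects.

    Positions present in only one object count as changed.
    """
    first_hashes = block_hashes(first, block_size)
    second_hashes = block_hashes(second, block_size)
    positions = max(len(first_hashes), len(second_hashes))
    if positions == 0:
        return 0.0
    unchanged = sum(1 for a, b in zip(first_hashes, second_hashes) if a == b)
    return (positions - unchanged) / positions
```

A fraction close to 1.0 between consecutive versions may indicate a bulk rewrite of the kind expected from encryption, although the cutoff applied to that fraction would be deployment specific.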

In some examples, the further analysis is based on machine learning that produces an indication of the unauthorized data encryption based on the first object and the second object.

In some examples, the machine-readable instructions further apply an I/O pattern analysis on writes including the write, to identify whether the potential data encryption constitutes unauthorized data encryption. In some examples, the I/O pattern analysis compares a pattern of the writes to a baseline write I/O pattern derived from historical write operations.

FIG. 4 is a block diagram of a system 400 according to some examples. The system 400 includes a hardware processor 402 (or multiple hardware processors). A hardware processor can include a microprocessor, a core of a multi-core microprocessor, a microcontroller, a programmable integrated circuit, a programmable gate array, or another hardware processing circuit.

The system 400 includes a storage medium 404 storing machine-readable instructions executable on the hardware processor 402 to perform various tasks. Machine-readable instructions executable on a hardware processor can refer to the instructions executable on a single hardware processor or the instructions executable on multiple hardware processors.

The machine-readable instructions in the storage medium 404 include inline detection instructions 406 to apply, using an inline detector, an inline detection of data in a data path to a journal that logs writes to a storage. The inline detection is to detect potential data encryption.

The machine-readable instructions in the storage medium 404 include potential data encryption indication sending instructions 408 to send, from the inline detector to an object analyzer, an indication of potential data encryption (e.g., the PDEI 120 in FIG. 1 or 2).

The machine-readable instructions in the storage medium 404 include object analysis instructions 410 to, in response to the indication of potential data encryption, apply, using the object analyzer, a further analysis to determine whether the potential data encryption constitutes unauthorized data encryption. The further analysis is based on a first object and a second object, the first object representing a first version of data after occurrence of the potential data encryption, and the second object representing a second version of data that is prior to the occurrence of the potential data encryption.

FIG. 5 is a flow diagram of a process 500 according to some examples of the present disclosure. The process 500 can be performed by a system including a hardware processor, where the system can include a computer or multiple computers.

The process 500 includes applying (at 502), using an inline detector, an inline detection of replicated data in a first data path to a journal that logs writes to a storage, the inline detection to detect potential data encryption, where the replicated data is of data being written over a separate second data path to the storage.

The process 500 includes sending (at 504), from the inline detector to an object analyzer, an indication of potential data encryption.

In response to the indication of potential data encryption, the process 500 includes applying (at 506), using the object analyzer, a further analysis to determine whether the potential data encryption constitutes unauthorized data encryption, the further analysis based on a first object and a second object, the first object representing a first version of data of a write after occurrence of the potential data encryption, and the second object representing a second version of the data of the write that is prior to the occurrence of the potential data encryption.
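The ordering of tasks 502, 504, and 506 can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the inline detector and the object analyzer are passed in as hypothetical callables, the indication is modeled as a direct function call, and the bytes of the flagged write stand in for the first object.

```python
from typing import Callable, Iterable, List


def run_detection(
    replicated_writes: Iterable[bytes],
    inline_detect: Callable[[bytes], bool],
    analyze_objects: Callable[[bytes, bytes], bool],
    prior_version: bytes,
) -> List[bool]:
    """Run the object analysis only for writes flagged by the inline check.

    Returns one verdict per flagged write; True means the object analysis
    judged the potential encryption to be unauthorized.
    """
    verdicts = []
    for write in replicated_writes:
        # 502: inexpensive inline detection on the replicated data.
        if not inline_detect(write):
            continue
        # 504: the indication is modeled here as calling the analyzer.
        # 506: compare the version after the write with the prior version.
        verdicts.append(analyze_objects(write, prior_version))
    return verdicts
```

The design point reflected in the sketch is that the more expensive comparison of two versions runs only after the inline detection has raised an indication.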

A storage medium (e.g., 300 in FIG. 3 or 404 in FIG. 4) can include any or some combination of the following: a semiconductor memory device such as a dynamic or static random access memory (a DRAM or SRAM), an erasable and programmable read-only memory (EPROM), an electrically erasable and programmable read-only memory (EEPROM) and flash memory; a magnetic disk such as a fixed, floppy and removable disk; another magnetic medium including tape; an optical medium such as a compact disk (CD) or a digital video disk (DVD); or another type of storage device. Note that the instructions discussed above can be provided on one computer-readable or machine-readable storage medium, or alternatively, can be provided on multiple computer-readable or machine-readable storage media distributed in a large system having possibly plural nodes. Such computer-readable or machine-readable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components. The storage medium or media can be located either in the machine running the machine-readable instructions, or located at a remote site from which machine-readable instructions can be downloaded over a network for execution.

In the foregoing description, numerous details are set forth to provide an understanding of the subject disclosed herein. However, implementations may be practiced without some of these details. Other implementations may include modifications and variations from the details discussed above. It is intended that the appended claims cover such modifications and variations.

What is claimed is:
 1. A non-transitory machine-readable storage medium comprising instructions that upon execution cause a system to: apply an inline detection of a write of data in a storage, the inline detection to detect potential data encryption of the data; in response to an indication of the potential data encryption, create a first object that represents a first version of the data; and apply a further analysis to determine whether the potential data encryption constitutes unauthorized data encryption, the further analysis based on the first object and a second object that represents a second version of the data that is prior to the first version of the data.
 2. The non-transitory machine-readable storage medium of claim 1, wherein the creating of the first object comprises creating a first snapshot, and the second object comprises a second snapshot created prior to the first snapshot.
 3. The non-transitory machine-readable storage medium of claim 1, wherein the inline detection is applied to writes to a journal that logs writes in a storage system.
 4. The non-transitory machine-readable storage medium of claim 3, wherein the writes to the journal are over a data path that is separate from a data path for the writes between one or more requesters and the storage system.
 5. The non-transitory machine-readable storage medium of claim 3, wherein the journal comprises replicated data that comprises versions of data at different timepoints, and wherein the first object and the second object are based on the replicated data in the journal.
 6. The non-transitory machine-readable storage medium of claim 1, wherein the inline detection is based on calculation of an entropy in write data.
 7. The non-transitory machine-readable storage medium of claim 6, wherein the entropy comprises an absolute entropy.
 8. The non-transitory machine-readable storage medium of claim 1, wherein the further analysis is based on calculation of relative entropy on the first object and the second object.
 9. The non-transitory machine-readable storage medium of claim 1, wherein the further analysis is based on calculation of hashes of the first object and the second object.
 10. The non-transitory machine-readable storage medium of claim 1, wherein the further analysis is based on machine learning that produces an indication of the unauthorized data encryption based on the first object and the second object.
 11. The non-transitory machine-readable storage medium of claim 1, wherein the instructions upon execution cause the system to: apply an input/output (I/O) pattern analysis on writes including the write, to identify whether the potential data encryption constitutes unauthorized data encryption.
 12. The non-transitory machine-readable storage medium of claim 11, wherein the I/O pattern analysis compares a pattern of the writes to a baseline write I/O pattern derived from historical write operations.
 13. The non-transitory machine-readable storage medium of claim 1, wherein the inline detection is applied to writes to a journal that logs writes in a storage system, wherein the journal includes write metadata relating to the writes, and wherein the instructions upon execution cause the system to: derive a write I/O pattern based on the write metadata; and identify, based on the write I/O pattern, whether the potential data encryption constitutes unauthorized data encryption.
 14. The non-transitory machine-readable storage medium of claim 11, wherein the I/O pattern analysis is not based on write data of the writes.
 15. A system comprising: a hardware processor; and a non-transitory storage medium storing instructions executable on the hardware processor to: apply, using an inline detector, an inline detection of data in a data path to a journal that logs writes to a storage, the inline detection to detect potential data encryption; send, from the inline detector to an object analyzer, an indication of potential data encryption; and in response to the indication of potential data encryption, apply, using the object analyzer, a further analysis to determine whether the potential data encryption constitutes unauthorized data encryption, the further analysis based on a first object and a second object, the first object representing a first version of data after occurrence of the potential data encryption, and the second object representing a second version of data that is prior to the occurrence of the potential data encryption.
 16. The system of claim 15, wherein the instructions are executable on the hardware processor to: send, from the inline detector to a pattern analyzer, the indication of potential data encryption; and in response to the indication of potential data encryption, apply, using the pattern analyzer, an input/output (I/O) write pattern analysis to determine whether the potential data encryption constitutes unauthorized data encryption.
 17. The system of claim 16, wherein the object analysis by the object analyzer is based on content of write data, and the I/O write pattern analysis by the pattern analyzer is not based on the content of write data.
 18. The system of claim 15, wherein the inline detector is to calculate a measure of absolute entropy to detect the potential data encryption, and the object analyzer is to calculate a measure of relative entropy to determine whether the potential data encryption constitutes unauthorized data encryption.
 19. A method of a system comprising a hardware processor, comprising: applying, using an inline detector, an inline detection of replicated data in a first data path to a journal that logs writes to a storage, the inline detection to detect potential data encryption, wherein the replicated data is of data being written over a separate second data path to the storage; sending, from the inline detector to an object analyzer, an indication of potential data encryption; and in response to the indication of potential data encryption, applying, using the object analyzer, a further analysis to determine whether the potential data encryption constitutes unauthorized data encryption, the further analysis based on a first object and a second object, the first object representing a first version of data of a write after occurrence of the potential data encryption, and the second object representing a second version of the data of the write that is prior to the occurrence of the potential data encryption.
 20. The method of claim 19, further comprising: sending, from the inline detector to a pattern analyzer, the indication of potential data encryption; and in response to the indication of potential data encryption, applying, using the pattern analyzer, an input/output (I/O) write pattern analysis to determine whether the potential data encryption constitutes unauthorized data encryption.